{
  "cells": [
    {
      "source": [
        "本notebook文件来源：https://pytorch.org/tutorials/beginner/blitz/neural_networks_tutorial.html"
      ],
      "cell_type": "markdown",
      "metadata": {}
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "%matplotlib inline"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\n",
        "Neural Networks\n",
        "===============\n",
        "\n",
        "Neural networks can be constructed using the ``torch.nn`` package.\n",
        "\n",
        "Now that you had a glimpse of ``autograd``, ``nn`` depends on\n",
        "``autograd`` to define models and differentiate them.\n",
        "An ``nn.Module`` contains layers, and a method ``forward(input)`` that\n",
        "returns the ``output``.\n",
        "\n",
        "differentiate求导\n",
        "\n",
        "For example, look at this network that classifies digit images:\n",
        "\n",
        "![figure](https://pytorch.org/tutorials/_images/mnist.png)\n",
        "\n",
        "   convnet\n",
        "\n",
        "It is a simple feed-forward network. It takes the input, feeds it\n",
        "through several layers one after the other, and then finally gives the\n",
        "output.\n",
        "\n",
        "feed-forward network前馈神经网络\n",
        "\n",
        "A typical training procedure for a neural network is as follows:\n",
        "\n",
        "- Define the neural network that has some learnable parameters (or\n",
        "  weights)\n",
        "- Iterate over a dataset of inputs\n",
        "- Process input through the network\n",
        "- Compute the loss (how far is the output from being correct)\n",
        "- Propagate gradients back into the network’s parameters\n",
        "- Update the weights of the network, typically using a simple update rule:\n",
        "  ``weight = weight - learning_rate * gradient``\n",
        "\n",
        "Define the network\n",
        "------------------\n",
        "\n",
        "Let’s define this network:\n",
        "\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 2,
      "metadata": {
        "collapsed": false
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Net(\n  (conv1): Conv2d(1, 6, kernel_size=(3, 3), stride=(1, 1))\n  (conv2): Conv2d(6, 16, kernel_size=(3, 3), stride=(1, 1))\n  (fc1): Linear(in_features=576, out_features=120, bias=True)\n  (fc2): Linear(in_features=120, out_features=84, bias=True)\n  (fc3): Linear(in_features=84, out_features=10, bias=True)\n)\n"
          ]
        }
      ],
      "source": [
        "import torch\n",
        "import torch.nn as nn\n",
        "import torch.nn.functional as F\n",
        "\n",
        "\n",
        "class Net(nn.Module):\n",
        "\n",
        "    def __init__(self):\n",
        "        super(Net, self).__init__()\n",
        "        #Conv2d是在输入信号（由几个平面图像构成）上应用2维卷积\n",
        "        # 1 input image channel, 6 output channels, 3x3 square convolution kernel\n",
        "        self.conv1 = nn.Conv2d(1, 6, 3)\n",
        "        self.conv2 = nn.Conv2d(6, 16, 3)\n",
        "        # an affine operation: y = Wx + b\n",
        "        #affine仿射的\n",
        "        self.fc1 = nn.Linear(16 * 6 * 6, 120)\n",
        "        #16是conv2的输出通道数，6*6是图像维度\n",
        "        #（32*32的原图，经conv1卷后是6*30*30，经池化后是6*15*15，经conv2卷后是16*13*13，经池化后是16*6*6）\n",
        "        #经过网络层后的维度数计算方式都可以看网络的类的文档来查到\n",
        "        self.fc2 = nn.Linear(120, 84)\n",
        "        self.fc3 = nn.Linear(84, 10)\n",
        "\n",
        "    def forward(self, x):\n",
        "        # Max pooling over a (2, 2) window\n",
        "        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))\n",
        "        # If the size is a square, you can specify with a single number\n",
        "        x = F.max_pool2d(F.relu(self.conv2(x)), 2)\n",
        "        x = x.view(-1, self.num_flat_features(x))\n",
        "        #将x转化为元素不变，尺寸为[-1,self.num_flat_features(x)]的Tensor\n",
        "        #-1的维度具体是多少，是根据另一维度计算出来的\n",
        "        #由于另一维度是x全部特征的长度，所以这一步就是把x从三维张量拉成一维向量\n",
        "        x = F.relu(self.fc1(x))\n",
        "        x = F.relu(self.fc2(x))\n",
        "        x = self.fc3(x)\n",
        "        return x\n",
        "\n",
        "    def num_flat_features(self, x):\n",
        "        #计算得到x的特征总数（就是把各维度乘起来）\n",
        "        size = x.size()[1:]  # all dimensions except the batch dimension\n",
        "        num_features = 1\n",
        "        for s in size:\n",
        "            num_features *= s\n",
        "        return num_features\n",
        "\n",
        "\n",
        "net = Net()\n",
        "print(net)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "You just have to define the ``forward`` function, and the ``backward``\n",
        "function (where gradients are computed) is automatically defined for you\n",
        "using ``autograd``.\n",
        "You can use any of the Tensor operations in the ``forward`` function.\n",
        "\n",
        "The learnable parameters of a model are returned by ``net.parameters()``\n",
        "\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 3,
      "metadata": {
        "collapsed": false
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "10\ntorch.Size([6, 1, 3, 3])\n"
          ]
        }
      ],
      "source": [
        "params = list(net.parameters())\n",
        "print(len(params))\n",
        "print(params[0].size())  # conv1's .weight"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Let's try a random 32x32 input.\n",
        "Note: expected input size of this net (LeNet) is 32x32. To use this net on\n",
        "the MNIST dataset, please resize the images from the dataset to 32x32.\n",
        "\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 4,
      "metadata": {
        "collapsed": false
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "tensor([[-0.0894, -0.0754, -0.1340,  0.0693,  0.0953,  0.0937, -0.1054,  0.0527,\n          0.0066,  0.0356]], grad_fn=<AddmmBackward>)\n"
          ]
        }
      ],
      "source": [
        "input = torch.randn(1, 1, 32, 32)\n",
        "out = net(input)\n",
        "print(out)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Zero the gradient buffers of all parameters and backprops with random\n",
        "gradients:\n",
        "\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 5,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "net.zero_grad()\n",
        "out.backward(torch.randn(1, 10))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Note\n",
        "``torch.nn`` only supports mini-batches. The entire ``torch.nn``\n",
        "    package only supports inputs that are a mini-batch of samples, and not\n",
        "    a single sample.\n",
        "\n",
        "For example, ``nn.Conv2d`` will take in a 4D Tensor of\n",
        "``nSamples x nChannels x Height x Width``.\n",
        "\n",
        "If you have a single sample, just use ``input.unsqueeze(0)`` to add\n",
        "a fake batch dimension.\n",
        "\n",
        "Before proceeding further, let's recap all the classes you’ve seen so far.\n",
        "\n",
        "**Recap:**\n",
        "  -  ``torch.Tensor`` - A *multi-dimensional array* with support for autograd\n",
        "     operations like ``backward()``. Also *holds the gradient* w.r.t. the\n",
        "     tensor.\n",
        "  -  ``nn.Module`` - Neural network module. *Convenient way of\n",
        "     encapsulating parameters*, with helpers for moving them to GPU,\n",
        "     exporting, loading, etc.\n",
        "  -  ``nn.Parameter`` - A kind of Tensor, that is *automatically\n",
        "     registered as a parameter when assigned as an attribute to a*\n",
        "     ``Module``.\n",
        "  -  ``autograd.Function`` - Implements *forward and backward definitions\n",
        "     of an autograd operation*. Every ``Tensor`` operation creates at\n",
        "     least a single ``Function`` node that connects to functions that\n",
        "     created a ``Tensor`` and *encodes its history*.\n",
        "\n",
        "**At this point, we covered:**\n",
        "  -  Defining a neural network\n",
        "  -  Processing inputs and calling backward\n",
        "\n",
        "**Still Left:**\n",
        "  -  Computing the loss\n",
        "  -  Updating the weights of the network\n",
        "\n",
        "## Loss Function\n",
        "A loss function takes the (output, target) pair of inputs, and computes a\n",
        "value that estimates how far away the output is from the target.\n",
        "\n",
        "There are several different\n",
        "[loss functions](https://pytorch.org/docs/nn.html#loss-functions) under the\n",
        "nn package .\n",
        "A simple loss is: ``nn.MSELoss`` which computes the mean-squared error\n",
        "between the input and the target.\n",
        "\n",
        "For example:\n",
        "\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 7,
      "metadata": {
        "collapsed": false
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "tensor(0.7491, grad_fn=<MseLossBackward>)\n"
          ]
        }
      ],
      "source": [
        "output = net(input)\n",
        "target = torch.randn(10)  # a dummy target, for example\n",
        "target = target.view(1, -1)  # make it the same shape as output\n",
        "criterion = nn.MSELoss()\n",
        "\n",
        "loss = criterion(output, target)\n",
        "print(loss)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Now, if you follow ``loss`` in the backward direction, using its\n",
        "``.grad_fn`` attribute, you will see a graph of computations that looks\n",
        "like this:\n",
        "\n",
        "\n",
        "\n",
        "    input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d\n",
        "          -> view -> linear -> relu -> linear -> relu -> linear\n",
        "          -> MSELoss\n",
        "          -> loss\n",
        "\n",
        "So, when we call ``loss.backward()``, the whole graph is differentiated\n",
        "w.r.t. the loss, and all Tensors in the graph that have ``requires_grad=True``\n",
        "will have their ``.grad`` Tensor accumulated with the gradient.\n",
        "\n",
        "For illustration, let us follow a few steps backward:\n",
        "\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 8,
      "metadata": {
        "collapsed": false
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "<MseLossBackward object at 0x0000015F84F7E388>\n<AddmmBackward object at 0x0000015FFFF98E48>\n<AccumulateGrad object at 0x0000015F84F7E388>\n"
          ]
        }
      ],
      "source": [
        "print(loss.grad_fn)  # MSELoss\n",
        "print(loss.grad_fn.next_functions[0][0])  # Linear\n",
        "print(loss.grad_fn.next_functions[0][0].next_functions[0][0])  # ReLU"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Backprop\n",
        "--------\n",
        "To backpropagate the error all we have to do is to ``loss.backward()``.\n",
        "You need to clear the existing gradients though, else gradients will be\n",
        "accumulated to existing gradients.\n",
        "\n",
        "\n",
        "Now we shall call ``loss.backward()``, and have a look at conv1's bias\n",
        "gradients before and after the backward.\n",
        "\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 9,
      "metadata": {
        "collapsed": false
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "conv1.bias.grad before backward\ntensor([0., 0., 0., 0., 0., 0.])\nconv1.bias.grad after backward\ntensor([ 0.0095, -0.0128,  0.0023,  0.0062, -0.0013, -0.0055])\n"
          ]
        }
      ],
      "source": [
        "net.zero_grad()     # zeroes the gradient buffers of all parameters\n",
        "\n",
        "print('conv1.bias.grad before backward')\n",
        "print(net.conv1.bias.grad)\n",
        "\n",
        "loss.backward()\n",
        "\n",
        "print('conv1.bias.grad after backward')\n",
        "print(net.conv1.bias.grad)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Now, we have seen how to use loss functions.\n",
        "\n",
        "**Read Later:**\n",
        "\n",
        "  The neural network package contains various modules and loss functions\n",
        "  that form the building blocks of deep neural networks. A full list with\n",
        "  documentation is [here](https://pytorch.org/docs/nn).\n",
        "\n",
        "**The only thing left to learn is:**\n",
        "\n",
        "  - Updating the weights of the network\n",
        "\n",
        "## Update the weights\n",
        "\n",
        "The simplest update rule used in practice is the Stochastic Gradient\n",
        "Descent (SGD):\n",
        "\n",
        "$weight = weight - learning\\_rate * gradient$\n",
        "\n",
        "We can implement this using simple Python code:\n",
        "\n",
        "```python\n",
        "    learning_rate = 0.01\n",
        "    for f in net.parameters():\n",
        "        f.data.sub_(f.grad.data * learning_rate)\n",
        "```\n",
        "\n",
        "However, as you use neural networks, you want to use various different\n",
        "update rules such as SGD, Nesterov-SGD, Adam, RMSProp, etc.\n",
        "To enable this, we built a small package: ``torch.optim`` that\n",
        "implements all these methods. Using it is very simple:\n",
        "\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 10,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import torch.optim as optim\n",
        "\n",
        "# create your optimizer\n",
        "optimizer = optim.SGD(net.parameters(), lr=0.01)\n",
        "\n",
        "# in your training loop:\n",
        "optimizer.zero_grad()   # zero the gradient buffers\n",
        "output = net(input)\n",
        "loss = criterion(output, target)\n",
        "loss.backward()\n",
        "optimizer.step()    # Does the update"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Note:\n",
        "\n",
        "Observe how gradient buffers had to be manually set to zero using\n",
        "``optimizer.zero_grad()``. This is because gradients are accumulated\n",
        "as explained in the `Backprop` section.\n",
        "\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": []
    }
  ],
  "metadata": {
    "kernelspec": {
      "name": "python376jvsc74a57bd07779bbc2c1aef615d27866e0f56ee45ae126f5562611ad823fbbdf236dddc76d",
      "display_name": "Python 3.7.6 64-bit ('base': conda)"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.7.6"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}